Skip to content

Conversation

@fitzjalen
Copy link

Summary

This PR introduces a MessageHistory abstraction on the agent run context so hooks can inspect and modify the effective conversation history seen by the model. Hooks can now inject additional messages, override the next turn’s input, and keep the transcript aligned with tool calls and streaming runs.

Motivation

Hooks already let users observe the agent lifecycle, but they had no structured way to:

  • see the full conversation as the model sees it,
  • inject extra messages at precise points (e.g. before/after tools),
  • or override the next model call input without rebuilding prompts by hand.

With MessageHistory, you can implement patterns such as:

  • safety or policy layers that inject moderator / developer guidance,
  • UX helpers that tell the LLM how many steps it has left or what phase of a workflow it is in,
  • domain assistants that add contextual hints or metadata right before a tool call,
  • teaching/tutoring flows that annotate tool results with explanations.

Changes

  • Added MessageHistory (src/agents/message_history.py):
    • Tracks original input, generated RunItems and pending injected messages.
    • Exposes:
      • get_messages() to snapshot the current transcript,
      • add_message(...) to queue injected messages as InjectedInputItems,
      • override_next_turn(...) to replace the next model call’s input,
      • stage markers around hooks and tools (agent_start, before_llm, after_llm, before_tool, after_tool).
  • Extended RunContextWrapper with message_history, available in both RunHooks and AgentHooks.
  • Integrated MessageHistory into AgentRunner:
    • Binds original input and generated items so hooks see the same history as the model.
    • Applies optional next-turn overrides (disallowed when using conversation_id / previous_response_id).
    • Appends pending injected messages to the model input.
    • Uses stage metadata to insert InjectedInputItems before/after tool calls and around LLM calls.
  • Added InjectedInputItem to RunItem so injected messages show up in new_items and streaming events.

Example

A custom set of hooks can now:

  • inject a developer message at agent start (e.g. global policy),
  • add guidance before each LLM call (e.g. “only call one shell command at a time”, “you have 3 steps left”),
  • add messages before and after each tool call (e.g. explain what the tool is doing or how to interpret its result).

For instance, a ShellTool example uses context.message_history.add_message(...) in on_start, on_llm_start, on_tool_start and on_tool_end to:

  • enforce serialized shell usage,
  • add instructions on how to create a CSV file,
  • and surface all injected developer messages in result.to_input_list() alongside tool calls and outputs.

Tests

  • tests/test_message_history.py:
    • test_run_hooks_can_inject_messages_into_llm_input
    • test_streamed_runs_emit_injected_input_items
    • test_injected_messages_preserve_order_around_tool_calls

All tests pass with make tests.

Usage example:

import asyncio
import os
from pathlib import Path

from agents import (
    Agent,
    AgentHooks,
    ModelSettings,
    Runner,
    RunContextWrapper,
    ShellCallOutcome,
    ShellCommandOutput,
    ShellCommandRequest,
    ShellResult,
    ShellTool,
    trace,
    Tool
)


class CustomAgentHooks(AgentHooks):
    def __init__(self, display_name: str):
        self.event_counter = 0
        self.display_name = display_name

    async def on_start(self, context, agent):
        self.event_counter += 1
        print(f"### ({self.display_name}) {self.event_counter}: Agent {agent.name} started.")
        context.message_history.add_message(
            agent=agent,
            message={
                    "role": "developer",
                    "content": f"This is developer message added at agent start {self.event_counter}.",
            },
        )


    async def on_llm_start(self, context, agent, system_prompt, input_items):
        context.message_history.add_message(
            agent=agent,
            message={
                    "role": "developer",
                    "content": (
                        f"This is developer message added before LLM call {self.event_counter}. "
                        "**Always call only one shell command at a time.** That means call shell for `ls` first, "
                        "wait for the result, then call shell for `date`, wait for the result, and so on. "
                        "YOU MUST FOLLOW THIS RULE STRICTLY!"
                    )
            },
        )

    async def on_llm_end(self, context, agent, response):
        context.message_history.add_message(
            agent=agent,
            message={
                    "role": "developer",
                    "content": f"This is developer message added after LLM call {self.event_counter}.",
            },
        )

    async def on_tool_start(
        self, context: RunContextWrapper, agent: Agent, tool: Tool
    ) -> None:
        self.event_counter += 1
        print(
            f"### ({self.display_name}) {self.event_counter}: Agent {agent.name} started tool {tool.name}"
        )

        context.message_history.add_message(
            agent=agent,
            message={
                    "role": "developer",
                    "content": f"This is developer message added before tool {tool.name} execution.",
            },
        )

    async def on_tool_end(
        self, context: RunContextWrapper, agent: Agent, tool: Tool, result: str
    ) -> None:
        self.event_counter += 1
        print(
            f"### ({self.display_name}) {self.event_counter}: Agent {agent.name} ended tool {tool.name} with result {result}"
        )

        context.message_history.add_message(
            agent=agent,
            message={
                    "role": "developer",
                    "content": (
                        f"This is developer message added after tool {tool.name} execution. "
                        "If the user ask you to create a tiny csv file with some data in it, "
                        "create a csv file with columns 'id' and 'value', and add 3 rows of sample data."
                    )
            },
        )

class ShellExecutor:
    """Executes shell commands with optional approval."""

    def __init__(self, cwd: Path | None = None):
        self.cwd = Path(cwd or Path.cwd())

    async def __call__(self, request: ShellCommandRequest) -> ShellResult:
        action = request.data.action

        outputs: list[ShellCommandOutput] = []
        for command in action.commands:
            proc = await asyncio.create_subprocess_shell(
                command,
                cwd=self.cwd,
                env=os.environ.copy(),
                stdout=asyncio.subprocess.PIPE,
                stderr=asyncio.subprocess.PIPE,
            )
            timed_out = False
            try:
                timeout = (action.timeout_ms or 0) / 1000 or None
                stdout_bytes, stderr_bytes = await asyncio.wait_for(
                    proc.communicate(), timeout=timeout
                )
            except asyncio.TimeoutError:
                proc.kill()
                stdout_bytes, stderr_bytes = await proc.communicate()
                timed_out = True

            stdout = stdout_bytes.decode("utf-8", errors="ignore")
            stderr = stderr_bytes.decode("utf-8", errors="ignore")
            outputs.append(
                ShellCommandOutput(
                    command=command,
                    stdout=stdout,
                    stderr=stderr,
                    outcome=ShellCallOutcome(
                        type="timeout" if timed_out else "exit",
                        exit_code=getattr(proc, "returncode", None),
                    ),
                )
            )

            if timed_out:
                break

        return ShellResult(
            output=outputs,
            provider_data={"working_directory": str(self.cwd)},
        )


with trace("shell_example"):
    agent = Agent(
        name="Shell Assistant",
        model="gpt-5.1",
        instructions=(
            "You can run shell commands using the shell tool. "
            "Keep responses concise and include command output when helpful."
        ),
        tools=[ShellTool(executor=ShellExecutor())],
        model_settings=ModelSettings(
            tool_choice="required",
            parallel_tool_calls=False
        ),
        hooks=CustomAgentHooks(display_name="Test Agent")
    )

    prompt = \
"""First, list files in the current directory using 'ls' command.
Then, show the current date using 'date' command.
After that create python file that creates a tiny csv file with some data in it.
Finally, show the data from the created csv file using 'cat' command. 
Provide results of each command in your response."""

    result = await Runner.run(agent, prompt)
    print(f"\nFinal response:\n{result.final_output}")

now result contains:

[{'content': "First, list files in the current directory using 'ls' command.\nThen, show the current date using 'date' command.\nAfter that create python file that creates a tiny csv file with some data in it.\nFinally, show the data from the created csv file using 'cat' command. \nProvide results of each command in your response.",
  'role': 'user'},
 {'role': 'developer',
  'content': 'This is developer message added at agent start 1.'},
 {'role': 'developer',
  'content': 'This is developer message added before LLM call 1. **Always call only one shell command at a time.** That means call shell for `ls` first, wait for the result, then call shell for `date`, wait for the result, and so on. YOU MUST FOLLOW THIS RULE STRICTLY!'},
 {'id': 'sh_0a6f7ea4569ac7de006924701d93bc819b8d5547a44df8c727',
  'action': {'commands': ['ls'],
   'max_output_length': 8802,
   'timeout_ms': None},
  'call_id': 'call_tskqGdnNwqunPxK9BRy8NRTP',
  'status': 'completed',
  'type': 'shell_call'},
 {'role': 'developer',
  'content': 'This is developer message added before tool shell execution.'},
 {'type': 'shell_call_output',
  'call_id': 'call_tskqGdnNwqunPxK9BRy8NRTP',
  'output': [{'stdout': 'sandbox.ipynb\n',
    'stderr': '',
    'outcome': {'type': 'exit', 'exit_code': 0}}]},
 {'role': 'developer',
  'content': "This is developer message added after tool shell execution. If the user ask you to create a tiny csv file with some data in it, create a csv file with columns 'id' and 'value', and add 3 rows of sample data."},
 {'role': 'developer',
  'content': 'This is developer message added after LLM call 1.'},
 {'role': 'developer',
  'content': 'This is developer message added before LLM call 3. **Always call only one shell command at a time.** That means call shell for `ls` first, wait for the result, then call shell for `date`, wait for the result, and so on. YOU MUST FOLLOW THIS RULE STRICTLY!'},
 {'id': 'sh_0a6f7ea4569ac7de006924701ef96c819ba49e872f8ff7bc96',
  'action': {'commands': ['date'],
   'max_output_length': 10240,
   'timeout_ms': None},
  'call_id': 'call_xpP3LmBozLhQtClp9HmOVJUm',
  'status': 'completed',
  'type': 'shell_call'},
 {'role': 'developer',
  'content': 'This is developer message added before tool shell execution.'},
 {'type': 'shell_call_output',
  'call_id': 'call_xpP3LmBozLhQtClp9HmOVJUm',
  'output': [{'stdout': 'Mon Nov 24 17:48:00 +03 2025\n',
    'stderr': '',
    'outcome': {'type': 'exit', 'exit_code': 0}}]},
 {'role': 'developer',
  'content': "This is developer message added after tool shell execution. If the user ask you to create a tiny csv file with some data in it, create a csv file with columns 'id' and 'value', and add 3 rows of sample data."},
 {'role': 'developer',
  'content': 'This is developer message added after LLM call 3.'},
 {'role': 'developer',
  'content': 'This is developer message added before LLM call 5. **Always call only one shell command at a time.** That means call shell for `ls` first, wait for the result, then call shell for `date`, wait for the result, and so on. YOU MUST FOLLOW THIS RULE STRICTLY!'},
 {'id': 'sh_0a6f7ea4569ac7de006924702223d0819bb1aeb94202d41b7c',
  'action': {'commands': ['python - << \'EOF\'\nimport csv\n\nfilename = \'data.csv\'\n\nrows = [\n    {\'id\': 1, \'value\': \'alpha\'},\n    {\'id\': 2, \'value\': \'beta\'},\n    {\'id\': 3, \'value\': \'gamma\'},\n]\n\nwith open(filename, \'w\', newline=\'\') as csvfile:\n    writer = csv.DictWriter(csvfile, fieldnames=[\'id\', \'value\'])\n    writer.writeheader()\n    writer.writerows(rows)\n\nprint(f"Created {filename} with sample data.")\nEOF'],
   'max_output_length': 10240,
   'timeout_ms': None},
  'call_id': 'call_jbIEb3453E4CAFrFwdYDxkSd',
  'status': 'completed',
  'type': 'shell_call'},
 {'role': 'developer',
  'content': 'This is developer message added before tool shell execution.'},
 {'type': 'shell_call_output',
  'call_id': 'call_jbIEb3453E4CAFrFwdYDxkSd',
  'output': [{'stdout': 'Created data.csv with sample data.\n',
    'stderr': '',
    'outcome': {'type': 'exit', 'exit_code': 0}}]},
 {'role': 'developer',
  'content': "This is developer message added after tool shell execution. If the user ask you to create a tiny csv file with some data in it, create a csv file with columns 'id' and 'value', and add 3 rows of sample data."},
 {'role': 'developer',
  'content': 'This is developer message added after LLM call 5.'},
 {'role': 'developer',
  'content': 'This is developer message added before LLM call 7. **Always call only one shell command at a time.** That means call shell for `ls` first, wait for the result, then call shell for `date`, wait for the result, and so on. YOU MUST FOLLOW THIS RULE STRICTLY!'},
 {'id': 'sh_0a6f7ea4569ac7de00692470246b3c819ba5fff00211b35d52',
  'action': {'commands': ['cat data.csv'],
   'max_output_length': 10240,
   'timeout_ms': None},
  'call_id': 'call_kAjmrvuXvuK6tHtz4TLIxun7',
  'status': 'completed',
  'type': 'shell_call'},
 {'role': 'developer',
  'content': 'This is developer message added before tool shell execution.'},
 {'type': 'shell_call_output',
  'call_id': 'call_kAjmrvuXvuK6tHtz4TLIxun7',
  'output': [{'stdout': 'id,value\r\n1,alpha\r\n2,beta\r\n3,gamma\r\n',
    'stderr': '',
    'outcome': {'type': 'exit', 'exit_code': 0}}]},
 {'role': 'developer',
  'content': "This is developer message added after tool shell execution. If the user ask you to create a tiny csv file with some data in it, create a csv file with columns 'id' and 'value', and add 3 rows of sample data."},
 {'role': 'developer',
  'content': 'This is developer message added after LLM call 7.'},
 {'role': 'developer',
  'content': 'This is developer message added before LLM call 9. **Always call only one shell command at a time.** That means call shell for `ls` first, wait for the result, then call shell for `date`, wait for the result, and so on. YOU MUST FOLLOW THIS RULE STRICTLY!'},
 {'id': 'msg_0a6f7ea4569ac7de0069247025b014819bbb4eaf0bc556cbef',
  'content': [{'annotations': [],
    'text': 'Here are the results of each step:\n\n1. `ls`\n   ```\n   sandbox.ipynb\n   ```\n\n2. `date`\n   ```\n   Mon Nov 24 17:48:00 +03 2025\n   ```\n\n3. Python script that creates the CSV file (executed via shell):\n   ```python\n   import csv\n\n   filename = \'data.csv\'\n\n   rows = [\n       {\'id\': 1, \'value\': \'alpha\'},\n       {\'id\': 2, \'value\': \'beta\'},\n       {\'id\': 3, \'value\': \'gamma\'},\n   ]\n\n   with open(filename, \'w\', newline=\'\') as csvfile:\n       writer = csv.DictWriter(csvfile, fieldnames=[\'id\', \'value\'])\n       writer.writeheader()\n       writer.writerows(rows)\n\n   print(f"Created {filename} with sample data.")\n   ```\n   Command output:\n   ```\n   Created data.csv with sample data.\n   ```\n\n4. `cat data.csv`\n   ```\n   id,value\n   1,alpha\n   2,beta\n   3,gamma\n   ```',
    'type': 'output_text',
    'logprobs': []}],
  'role': 'assistant',
  'status': 'completed',
  'type': 'message'},
 {'role': 'developer',
  'content': 'This is developer message added after LLM call 9.'}]

Copilot AI review requested due to automatic review settings November 24, 2025 14:56
Copilot finished reviewing on behalf of fitzjalen November 24, 2025 15:01
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a MessageHistory abstraction to the agent run context, enabling hooks to inspect and modify the conversation history visible to the model. This allows developers to inject additional messages, override turn inputs, and implement sophisticated patterns like safety layers, workflow guidance, and teaching flows.

Key changes:

  • Added MessageHistory class with methods to track, inject, and override conversation messages at specific lifecycle stages
  • Integrated message history into both RunHooks and AgentHooks via RunContextWrapper.message_history
  • Modified agent runner to apply injected messages and overrides while maintaining proper ordering around tool calls

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
src/agents/message_history.py New core abstraction for tracking and manipulating conversation history with stage-aware injection
src/agents/run_context.py Added message_history field to RunContextWrapper for hook access
src/agents/run.py Integrated message history lifecycle: binding inputs/items, consuming overrides, inserting injected items with proper ordering
src/agents/_run_impl.py Wrapped tool hooks with injection stage markers and updated streaming to emit InjectedInputItem events
src/agents/items.py Added InjectedInputItem type to represent hook-injected messages in run results
src/agents/__init__.py Exported new InjectedInputItem and MessageHistory types
tests/test_message_history.py Added comprehensive tests for message injection, streaming, and ordering guarantees
docs/usage.md Documented message history API with examples and limitations
docs/zh/usage.md Chinese documentation for message history feature
docs/ko/usage.md Korean documentation for message history feature
docs/ja/usage.md Japanese documentation for message history feature

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- [`Usage`][agents.usage.Usage] - 使用状況トラッキングのデータ構造
- [`RequestUsage`][agents.usage.RequestUsage] - リクエストごとの使用状況の詳細
- [`RunContextWrapper`][agents.run.RunContextWrapper] - 実行コンテキストから使用状況にアクセス
- [`MessageHistory`][agents.run_context.MessageHistory] - フックから会話履歴を閲覧・編集
Copy link

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect module reference in documentation link. The link references [agents.run_context.MessageHistory] but based on the code structure, MessageHistory is defined in src/agents/message_history.py, so the reference should be [agents.message_history.MessageHistory].

Suggested change
- [`MessageHistory`][agents.run_context.MessageHistory] - フックから会話履歴を閲覧・編集
- [`MessageHistory`][agents.message_history.MessageHistory] - フックから会話履歴を閲覧・編集

Copilot uses AI. Check for mistakes.
):
for item in new_step_items:
if isinstance(item, MessageOutputItem):
if isinstance(item, MessageOutputItem) or isinstance(item, InjectedInputItem):
Copy link

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code style: use isinstance(item, (MessageOutputItem, InjectedInputItem)) instead of isinstance(item, MessageOutputItem) or isinstance(item, InjectedInputItem) for better readability and performance.

Suggested change
if isinstance(item, MessageOutputItem) or isinstance(item, InjectedInputItem):
if isinstance(item, (MessageOutputItem, InjectedInputItem)):

Copilot uses AI. Check for mistakes.
return
if not self._stage_markers:
return
if self._stage_markers and self._stage_markers[-1] is marker:
Copy link

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant condition check. Line 176 checks if self._stage_markers after line 174 already checked if not self._stage_markers and returned early. The second check is unnecessary.

Suggested change
if self._stage_markers and self._stage_markers[-1] is marker:
if self._stage_markers[-1] is marker:

Copilot uses AI. Check for mistakes.
- [`Usage`][agents.usage.Usage] - Usage tracking data structure
- [`RequestUsage`][agents.usage.RequestUsage] - Per-request usage details
- [`RunContextWrapper`][agents.run.RunContextWrapper] - Access usage from run context
- [`MessageHistory`][agents.run_context.MessageHistory] - Inspect or edit the conversation from hooks
Copy link

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect module reference in documentation link. The link references [agents.run_context.MessageHistory] but based on the code structure, MessageHistory is defined in src/agents/message_history.py, so the reference should be [agents.message_history.MessageHistory].

Suggested change
- [`MessageHistory`][agents.run_context.MessageHistory] - Inspect or edit the conversation from hooks
- [`MessageHistory`][agents.message_history.MessageHistory] - Inspect or edit the conversation from hooks

Copilot uses AI. Check for mistakes.
- [`Usage`][agents.usage.Usage] - 用量跟踪数据结构
- [`RequestUsage`][agents.usage.RequestUsage] - 按请求的用量详情
- [`RunContextWrapper`][agents.run.RunContextWrapper] - 从运行上下文访问用量
- [`MessageHistory`][agents.run_context.MessageHistory] - 在钩子中查看或编辑对话
Copy link

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect module reference in documentation link. The link references [agents.run_context.MessageHistory] but based on the code structure, MessageHistory is defined in src/agents/message_history.py, so the reference should be [agents.message_history.MessageHistory].

Suggested change
- [`MessageHistory`][agents.run_context.MessageHistory] - 在钩子中查看或编辑对话
- [`MessageHistory`][agents.message_history.MessageHistory] - 在钩子中查看或编辑对话

Copilot uses AI. Check for mistakes.
- [`Usage`][agents.usage.Usage] - 사용량 추적 데이터 구조
- [`RequestUsage`][agents.usage.RequestUsage] - 요청별 사용량 세부 정보
- [`RunContextWrapper`][agents.run.RunContextWrapper] - 실행 컨텍스트에서 사용량 접근
- [`MessageHistory`][agents.run_context.MessageHistory] - 훅에서 대화 기록을 조회/편집
Copy link

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect module reference in documentation link. The link references [agents.run_context.MessageHistory] but based on the code structure, MessageHistory is defined in src/agents/message_history.py, so the reference should be [agents.message_history.MessageHistory].

Suggested change
- [`MessageHistory`][agents.run_context.MessageHistory] - 훅에서 대화 기록을 조회/편집
- [`MessageHistory`][agents.message_history.MessageHistory] - 훅에서 대화 기록을 조회/편집

Copilot uses AI. Check for mistakes.
return
try:
self._stage_markers.remove(marker)
except ValueError:
Copy link

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'except' clause does nothing but pass and there is no explanatory comment.

Copilot uses AI. Check for mistakes.
@seratch seratch added enhancement New feature or request feature:core labels Nov 24, 2025
@seratch
Copy link
Member

seratch commented Nov 24, 2025

We've got similar suggestions in the past, but enabling this operation could make the whole agent app way more complicated and much harder to debug. For this reason, we still hesitate to add such a flexibility.

@seratch seratch marked this pull request as draft November 24, 2025 23:18
@fitzjalen
Copy link
Author

We've got similar suggestions in the past, but enabling this operation could make the whole agent app way more complicated and much harder to debug. For this reason, we still hesitate to add such a flexibility.

Thank you for your response! I agree that it largely depends on the use case. In my situation, this is already the second project in a row where this feature becomes absolutely essential.

Previously, I ran into several issues, for example:
• how to dynamically instruct the agent about the remaining number of steps;
• how to guide the agent on interpreting the output of certain tools;
• how to handle context-window limits — e.g., asking the agent to summarize, validate checklists, and continue execution once the window gets close to full;
and a few others.

Technically, some of this could be solved by patching tool outputs, but in practice the model does not reliably follow such patches. More importantly, this kind of patching becomes an additional layer of complexity that actually makes debugging harder — exactly the concern you mentioned.

I experimented with injecting developer messages through a hook, and in my case it significantly improved the agent’s stability and controllability. My use case involves long-running agents, so perhaps this feature is primarily valuable in scenarios like these.

Nevertheless, I appreciate your review. I’ll be glad if at some point you decide this feature could be useful for your own cases as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request feature:core

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants